Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

Authors

  • David Abel
  • Alekh Agarwal
  • Fernando Diaz
  • Akshay Krishnamurthy
  • Robert E. Schapire
Abstract

High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, nonparametric function approximator for learning on Q-function residuals. And second, we propose an exploration strategy inspired by the principles of state abstraction and information acquisition under uncertainty. We demonstrate the empirical effectiveness of these techniques, first, as a preliminary check, on two standard tasks (Blackjack and n-Chain), and then on two much larger and more realistic tasks with high-dimensional observation spaces. Specifically, we introduce two benchmarks built within the game Minecraft where the observations are pixel arrays of the agent’s visual field. A combination of our two algorithmic techniques performs competitively on the standard reinforcement-learning tasks while consistently and substantially outperforming baselines on the two tasks with high-dimensional observation spaces. The new function approximator, exploration strategy, and evaluation benchmarks are each of independent interest in the pursuit of reinforcement-learning methods that scale to real-world domains.
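
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of fitting an additive model to Q-function residuals, not the authors' method. Everything in it is an assumption made for the example: the toy deterministic chain (a simplified stand-in for n-Chain, without action slip), the reward values, the step size, and the names `step`, `fit_weak_learner`, `q_value`, and `boost_q`. The regressor simply averages targets per distinct input; in a high-dimensional setting one would presumably swap in a real weak learner such as a small regression tree.

```python
from collections import defaultdict

# Toy problem constants (all assumed for illustration).
N_STATES, ACTIONS, GAMMA = 5, (0, 1), 0.9  # action 0 = forward, 1 = back


def step(state, action):
    """Deterministic toy chain: returns (next_state, reward)."""
    if action == 0:
        if state == N_STATES - 1:
            return state, 10.0      # stay at the end of the chain, large reward
        return state + 1, 0.0       # move forward, no reward
    return 0, 2.0                   # jump back to the start, small reward


def fit_weak_learner(inputs, targets):
    """Stand-in regressor: predicts the mean target seen for each input."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(inputs, targets):
        sums[x] += y
        counts[x] += 1
    means = {x: sums[x] / counts[x] for x in sums}
    return lambda x: means.get(x, 0.0)


def q_value(ensemble, step_size, state, action):
    """Q is the scaled sum of all regressors fitted so far."""
    return step_size * sum(h((state, action)) for h in ensemble)


def boost_q(rounds=200, step_size=0.5):
    """Each round fits a new regressor to the current Bellman residuals."""
    ensemble = []
    transitions = [(s, a) + step(s, a) for s in range(N_STATES) for a in ACTIONS]
    for _ in range(rounds):
        inputs, residuals = [], []
        for s, a, s_next, r in transitions:
            best_next = max(q_value(ensemble, step_size, s_next, b) for b in ACTIONS)
            target = r + GAMMA * best_next
            inputs.append((s, a))
            residuals.append(target - q_value(ensemble, step_size, s, a))
        ensemble.append(fit_weak_learner(inputs, residuals))
    return lambda s, a: q_value(ensemble, step_size, s, a)
```

Calling `q = boost_q()` returns a function `q(state, action)`. Because the stand-in regressor fits the residuals exactly on this tiny problem, each round behaves like a damped value-iteration update; with a genuinely weak learner the fit would be approximate and convergence is not guaranteed by this sketch.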


Similar resources

Accelerating Imitation Learning in Relational Domains via Transfer by Initialization

The problem of learning to mimic a human expert/teacher from training trajectories is called imitation learning. To make the process of teaching easier in this setting, we propose to employ transfer learning (where one learns on a source problem and transfers the knowledge to potentially more complex target problems). We consider multi-relational environments such as real-time strategy games an...



An Evolutionary Feature Discovery Method for Reinforcement Learning

Using linear methods for reinforcement learning problems requires designing efficient features. However, designing features often requires having ample knowledge about the problem domain. When dealing with complex problem domains, coming up with efficient feature sets often requires a trial-and-error process which can prove difficult or inefficient. We present an evolutionary algorithm for gene...


PolicyBoost: Functional Policy Gradient with Ranking-Based Reward Objective

Learning policies in nonlinear representations is an important step toward real-world applications of reinforcement learning in robotics. While functional representation has been widely applied in state-of-the-art supervised learning techniques (also known as boosting approaches) to adaptively learn nonlinear functions, in reinforcement learning the boosting-style approaches have been little inve...


Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control

Reinforcement learning and evolutionary strategy are two major approaches to addressing complicated control problems. Both have a strong biological basis, and many advanced techniques have recently been developed in both domains. In this paper, we present a thorough comparison between the state-of-the-art techniques in both domains in complex continuous control tasks. We also formulate the parallelize...



Journal:
  • CoRR

Volume abs/1603.04119

Publication date 2016